The Wechsler Intelligence Scale for Children-Revised
(WISC-R), one of the most commonly used tests of cognitive ability,
is difficult to administer accurately. The purpose of this study was 
primarily to assess interrater agreement on the WISC-R Administration
Observational Checklist (WAOC), a new observational instrument that 
can be used by an observer to evaluate all components of WISC-R 
administration. A secondary purpose of the study was to evaluate two 
WISC-R administrations of five students enrolled in a graduate course 
in psychoeducational assessment. Based on a total of 10 observations 
by two raters, Cohen's kappa was calculated for 29 of the measures on
the checklist. The values for 22 of the measures were significant (p
less than .05). The difference in mean scores for the first and
second observations of the students did not quite reach statistical 
significance because of the small number of subjects and because of a 
ceiling effect for one student who scored highest on the first 
observation. However, after receiving feedback, students showed 
improvement on a number of the measures. WAOC enables the observer to 
pinpoint examiner errors and to give specific feedback regarding 
those errors. (Author/JAZ) 
Abstract

The purpose of the present study was primarily to assess 
interrater agreement on the WISC-R Administration 
Observational Checklist (WAOC) and secondarily to evaluate 
two WISC-R administrations of five students enrolled in a 
graduate course in Psychoeducational Assessment. Based on a
total of 10 observations by two raters, Cohen's κ was
calculated for 29 of the measures on the checklist. The
values for 22 of the measures were significant (p < .05).
The difference in mean scores for the first and second 
observations of the students did not quite reach 
significance, though after receiving feedback, students 
showed improvement on a number of the measures. 
Development of the WISC-R Administration
Observational Checklist 
Because the results of a test are only as accurate as 
its administration, a test must consistently be administered 
according to its standard directions in order to keep 
administration error as a source of error variance at a 
minimum. The Wechsler Intelligence Scale for Children-Revised
(WISC-R), one of the most commonly used tests of
cognitive ability, is also one of the more difficult to 
administer accurately. Based on observations of WISC-R 
administration, Fantuzzo, Sisemore, and Spradlin (1983) 
determined that Comprehension and Vocabulary were the most 
difficult Verbal subtests to administer with the major source 
of error being failure to accurately probe ambiguous 
responses. Block Design and Picture Arrangement were the 
most inaccurately administered Performance subtests with the 
major errors involving departures from standardized verbal 
instructions and nonstandard manipulations of test materials. 
Another major error was lack of adherence to standardized 
presentation of digits on Digit Span. 

Attempts have been made within the context of assessment 
training to ensure accuracy of Wechsler administration
(Boehm, Duker, Haesloop, & White, 1974; Fantuzzo et al., 
1983). Neither of these existing approaches, however, 
makes possible the precise pinpointing of an examiner's 
administration errors. 

A new observational instrument, the WISC-R
Administration Observational Checklist (WAOC; Stewart, 1984),
can be used by an observer to evaluate all components of 
WISC-R administration, item by item. The WAOC is divided 
into 12 subtests each corresponding to one of the WISC-R 
subtests. In turn, each subtest contains either two or three 
sections. For each subtest the first section is used to 
assess an examiner's accuracy in using correct starting and 
stopping points, administering early test items, and 
following other general instructions on the test. The second 
and primary section for each subtest is used to evaluate the 
main body of the administration of the subtest. Using a
"yes/no" format, this section is used to assess the
accuracy of verbal directions, manipulation of test 
materials, and timing for every item administered. 
Appropriateness of the use of queries is also evaluated in 
this section. For certain subtests a third section is 
included to cover "special considerations" that typically 
occur infrequently during the course of the administration of 
the test (e.g., use of special prompts). To aid in the use 
of the checklist, exact directions from the manual are 
included within the context of the checklist. 

The purpose of the present study was primarily to 
establish the interrater reliability of the WAOC. In the
process of doing so, the WAOC was used to evaluate the WISC-R 
administration of five students enrolled in a graduate course 
in Psychoeducational Assessment. 
Method

Subjects and Observers 

The subjects in the study were five graduate students 
enrolled in their first of two courses in Psychoeducational 
Assessment. The two observers in the study were the 
instructor of the assessment course (also the developer of 
the WAOC) and the graduate teaching assistant for the course, 
a student who had already completed the course.

Procedure

At the beginning of instruction on WISC-R 
administration, all students were given a copy of the WAOC
and were told that their test administration would be 
evaluated using the checklist. The course instructor and the 
graduate teaching assistant observed all students on their 
fourth formal administration of the test (i.e., to a child in 
a public school setting). All observations were made
simultaneously by the two observers in order to obtain 
interrater agreement data on the checklist. Subsequently, 
the instructor gave detailed feedback to each student 
regarding exactly where he or she had made errors and how 
those errors might be avoided on future administrations of 
the test. The students were then observed by the instructor
and teaching assistant during their seventh formal 
administration of the test. For the second observation the 
students were asked to test a child of the same age as the 
one tested during the first observation in order to keep the 
items administered as similar as possible for the two
evaluations. 

Results 

In order to evaluate interrater agreement taking into 
account the proportion of agreement due to chance, Cohen's 
(1960) κ was calculated for each of the 29 measures used to
evaluate the main body of the test administration (i.e., 
reading of directions, manipulations, timing, etc.). 
Agreement scores were based on the sum of all cases (i.e., 
observations on individual test items) across 10 observation 
sessions for each of the 29 measures. 
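
For reference, Cohen's (1960) coefficient corrects the observed proportion of agreement for the agreement expected by chance. In standard notation (the symbols p_o and p_e do not appear in the text above and are introduced here only for illustration):

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

Here p_o is the observed proportion of cases on which the two raters agree, and p_e is the proportion of agreement expected by chance given each rater's marginal totals. The coefficient equals 1 for perfect agreement and 0 when agreement is no better than chance.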

The κ value for each of the 29 measures as well as
simple proportion of agreement scores (provided for
comparison) are reported in Table 1. The κ values for 20 of
the measures were significant at p < .01 and 2 were
significant at p < .05. Of the 7 measures that were not
significant, 5 were measures of timing. The low interrater 
agreement on timing was primarily due to difficulty in
determining when the examiner was actually starting and 
stopping the watch. Agreement on the reading of digits for 
Digit Span was not significant because only one error was 
made in the 163 cases observed, and the raters were not in 
agreement on that case. The κ value for the reading of
Vocabulary also was not significant and suggested a need for 
further clarification of how that subtest would be evaluated. 
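
The Digit Span result shows why simple proportion of agreement and chance-corrected agreement can diverge sharply. The following is a minimal sketch, not the procedure used in the study. It assumes that one rater recorded the single error and the other recorded none, which is one reading of "the raters were not in agreement on that case"; the function name `cohens_kappa` and the "yes"/"no" coding are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return (proportion of agreement, Cohen's kappa) for two rating lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases on which both raters gave the same code.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal counts, summed over codes.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one and the same code throughout; kappa is undefined.
        return p_observed, float("nan")
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Assumed reconstruction of the Digit Span reading data (163 cases):
# rater A marks one item as incorrectly read, rater B marks none.
rater_a = ["yes"] * 162 + ["no"]
rater_b = ["yes"] * 163
agreement, kappa = cohens_kappa(rater_a, rater_b)
print(round(agreement, 2), round(kappa, 2))
```

Under this assumption the raters agree on 162 of 163 cases, yet all of that agreement is what chance alone would predict from the marginal totals, so the chance-corrected coefficient falls to zero.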

Insert Table 1 about here

Mean scores for the first observations and the second 
observations based on the 29 measures were 68.4% and 83.1%, 
respectively. While this difference did not quite reach 
statistical significance, there was a trend toward 
significance [t(4) = -2.34, p = .08]. The fact that
statistical significance was not achieved was not surprising, 
however, because of the small number of subjects and because
of a ceiling effect for at least the one subject who scored 
highest on the first observation. 
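
As a rough check on the reported statistic, the two-tailed probability for t = -2.34 with 4 degrees of freedom (five subjects, paired observations) can be approximated by numerically integrating the t density. This is only a sketch using Simpson's rule and the Python standard library; the function names are hypothetical and the individual student scores are not reproduced here.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Approximate P(|T| > |t_stat|) with Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t_stat|)
    return 1 - 2 * central_area


p_value = two_tailed_p(-2.34, df=4)
print(round(p_value, 2))
```

The result is close to .08, which agrees with the probability reported above and falls between the conventional .05 and .10 levels.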

During the initial observations, errors were 
particularly common in the reading of the directions for 
Picture Arrangement, Block Design, and Coding; in the 
manipulations accompanying Picture Arrangement, Block Design, 
and Coding; and in the speed of reading the digits on Digit 
Span. In addition, the students often had difficulty with 
timing, apparently due to lack of practice in holding, 
starting, stopping, and resetting their stopwatches. 
Improvement was made in all of these areas by the time of the 
second set of observations. Errors in querying were most 
common on Similarities, Vocabulary, and Comprehension. While 
improvement was made on Similarities and Comprehension by the 
second observations, many errors were still occurring on 
Vocabulary. Errors in the cautions for Picture Completion 
and repetitions for Arithmetic were frequently made during 
first observations but seldom occurred during the second
observations. 

Discussion 

Interrater agreement on the various observation measures 
on the WAOC was generally high. In general, the data from 
this study support its usefulness as a training tool. 

The WAOC made it possible to give the students specific
feedback regarding errors after the first observations.
During the second observations, students showed improvement 
in their test administration, correcting many of the errors
made during the first observations. Further research is 
necessary, however, to determine whether or not feedback 
based on the WAOC is more facilitative than less systematic, 
less objective feedback. 

The WISC-R Administration Observational Checklist has 
been developed as an aid to help increase accuracy of 
administration of the WISC-R. This goal is based on the need
to keep administration error at a minimum in order to 
maximize the reliability and validity of the test results. 
The WAOC enables the observer to pinpoint examiner errors and 
in turn to give specific feedback regarding those errors. 
Although other strategies for improving administration of the 
WISC-R have been attempted (e.g., Fantuzzo et al., 1983), the
WAOC makes it possible to give more precise and detailed feedback
than other approaches that have been used.
Table 1

Level of Congruence Between Two Raters for the WISC-R
Administration Observational Checklist Subscales

Measure          No. of     Proportion      Cohen's     Kappa
                 Cases      of Agreement    Kappa       Probability

Reads
  I               200          1.00          1.00          .05
  S               151           --            .77          .01
  PA               61           .93           .87          .01
  A               125           .94           .63          .01
  BD               36           .83           .65          .01
  V               225           .98           .59          .06
  OA               50           .90           .72          .01
  COMP            143           .96           .55          .04
  CD               30           .77           .51          .01
  DS              163           .99           .00         1.00
  M                38           .97           .94          .01

Manipulates
  PA               61           .82           .77          .01
  BD               36           .86           .70          .01
  OA               50           .80           .51          .01
  CD               30           .83           .66          .01
  M                35           .83           .58          .01

Table 1 (cont.)

Level of Congruence Between Two Raters for the WISC-R
Administration Observational Checklist Subscales

Measure          No. of     Proportion      Cohen's     Kappa
                 Cases      of Agreement    Kappa       Probability

Starts Time
  PA              105           .83           .44          .01
  A               112           .95           .70          .01
  BD               80           .84           .57          .01
  OA               40           .98           .79          .07
  M                66           .85           .36          .11

Stops Time
  PA              105           .87           .62          .01
  A               112           .92           .60          .01
  BD               80           .83           .20          .34
  OA               40           .80           .32          .20
  M                66           .85           .52          .01

Other
  PC (time)       205           .91           .26          .17
  BD (scrambles)   79           .87           .70          .01
  DS (speed)      163           .74           .12          .01